[triton-raise-block-pointer]: Introduce env. variable to ignore masked load/stores #3416
Signed-off-by: Tiotto, Ettore <[email protected]>
Note: this PR allows users to assert that the masks can be dropped (i.e. will always evaluate to true). This option is a stop-gap. I plan to work on mask analysis next.
A next step in mask management could be to check whether masks ‘only’ avoid overflow (rather than manage data flow) and to set the ‘ignore-mask’ flag if this is the case. I think the boundary-control masks should be simple enough for the pass to identify?
I agree that there are broadly two cases of masks to distinguish between, but I don't think we should make that explicit in the code. Do the masks prevent the usage of block pointers or just block loads? Hypothetically, if we can use block ptrs with masked loads and fall back to the gather/scalar load, then we can develop heuristics to determine whether or not a 2D block load without masks is safe, as an optimization step in the load lowering.
The tt.load/tt.store operations do not accept a mask if the ptr operand is a block pointer. I modified tutorial 10 as:
And we then get a compilation error:
We may be able to "bypass" that error if we change the tt.load after the diagnostic fires (that is, the user cannot legally write that code but the compiler could). The first goal IMO is to determine whether the mask always evaluates to true (or false) for each loop iteration. In that case we can simply remove the mask (if it is always true) or propagate 'other' (if it is always false).
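The decision described above can be sketched as a small, self-contained C++ model. This is only an illustration under stated assumptions: `MaskedLoad`, `Rewrite`, and `simplifyMaskedLoad` are hypothetical names, not Triton or MLIR APIs, and the mask analysis that would produce `maskAlwaysValue` is assumed rather than implemented.

```cpp
#include <optional>

// Hypothetical model of a masked load; not part of the Triton/MLIR APIs.
struct MaskedLoad {
  // Assumed output of a mask analysis: true or false when the mask is known
  // to hold that value on every loop iteration, std::nullopt when unknown.
  std::optional<bool> maskAlwaysValue;
};

// Possible outcomes of the simplification.
enum class Rewrite { DropMask, ReplaceWithOther, Keep };

// Decide how a masked load could be simplified from what is known about
// its mask.
inline Rewrite simplifyMaskedLoad(const MaskedLoad &load) {
  if (!load.maskAlwaysValue.has_value())
    return Rewrite::Keep;  // mask varies or could not be analysed
  if (*load.maskAlwaysValue)
    return Rewrite::DropMask;  // mask is redundant, load can be unmasked
  return Rewrite::ReplaceWithOther;  // load never fires, result is 'other'
}
```

Only the `DropMask` outcome would make the operation eligible for raising to a block pointer; the other two leave it on the existing lowering path.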
    .Case<tt::LoadOp>(
        [this](auto loadOp) { return IgnoreMasks || !loadOp.getMask(); })
    .Case<tt::StoreOp>([this](auto storeOp) {
      return IgnoreMasks || !storeOp.getMask();
These two cases can likely be combined.
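A minimal sketch of what combining the two cases might look like, using `std::variant` and a generic lambda as a stand-in for the type switch. The `LoadOp` and `StoreOp` structs, the `MemOp` alias, and `isCandidate` are hypothetical; only the `getMask()` accessor from the snippet above is modelled. In the pass itself, a single `.Case<tt::LoadOp, tt::StoreOp>(...)` taking an `auto` parameter may achieve the same effect, assuming the type switch in use accepts several types per case.

```cpp
#include <variant>

// Hypothetical stand-ins for tt::LoadOp and tt::StoreOp.
struct LoadOp {
  bool hasMask;
  bool getMask() const { return hasMask; }
};
struct StoreOp {
  bool hasMask;
  bool getMask() const { return hasMask; }
};

using MemOp = std::variant<LoadOp, StoreOp>;

// One generic lambda handles both operation kinds, because the predicate
// depends only on getMask().
inline bool isCandidate(const MemOp &op, bool ignoreMasks) {
  return std::visit(
      [ignoreMasks](const auto &memOp) {
        return ignoreMasks || !memOp.getMask();
      },
      op);
}
```

The predicate is unchanged: an operation qualifies when it has no mask, or when masks are being ignored.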
This makes sense because masked loads are not a block ptr option in the language, likely because computing the mask would be inefficient. If our goal is to use the block ptr machinery (namely 2D block load/store/prefetch), then do we need to internally lower to block ptr? If so, we probably should respect the existing language conventions so we don’t diverge from upstream or have nasty surprises later. If we need to support the mask, then maybe we need a different lowering strategy.
When `tt.load` and `tt.store` operations have a mask, the compiler cannot safely "raise" them to use block pointers (block pointer loads/stores are unmasked).

This PR introduces a sub-option (`ignore-masks`) for the env. variable `TRITON_INTEL_RAISE_BLOCK_POINTER`. The sub-option allows the compiler to rewrite masked loads/stores into unmasked ones before attempting conversion to block ptr loads/stores.